Datascience para el Bien Social

Star Wars survey: which films in the saga are the most popular, and with whom?

In this project we first clean the data so that we can work with it, and then look at which films in the saga are the most popular and with which respondents.

In [5]:
import pandas as pd
star_wars = pd.read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/star-wars-survey/StarWars.csv", encoding="ISO-8859-1")
In [6]:
print(star_wars.head(3))
   RespondentID Have you seen any of the 6 films in the Star Wars franchise?  \
0           NaN                                           Response             
1  3.292880e+09                                                Yes             
2  3.292880e+09                                                 No             

  Do you consider yourself to be a fan of the Star Wars film franchise?  \
0                                           Response                      
1                                                Yes                      
2                                                NaN                      

  Which of the following Star Wars films have you seen? Please select all that apply.  \
0           Star Wars: Episode I  The Phantom Menace                                    
1           Star Wars: Episode I  The Phantom Menace                                    
2                                                NaN                                    

                                    Unnamed: 4  \
0  Star Wars: Episode II  Attack of the Clones   
1  Star Wars: Episode II  Attack of the Clones   
2                                          NaN   

                                    Unnamed: 5  \
0  Star Wars: Episode III  Revenge of the Sith   
1  Star Wars: Episode III  Revenge of the Sith   
2                                          NaN   

                          Unnamed: 6  \
0  Star Wars: Episode IV  A New Hope   
1  Star Wars: Episode IV  A New Hope   
2                                NaN   

                                     Unnamed: 7  \
0  Star Wars: Episode V The Empire Strikes Back   
1  Star Wars: Episode V The Empire Strikes Back   
2                                           NaN   

                                 Unnamed: 8  \
0  Star Wars: Episode VI Return of the Jedi   
1  Star Wars: Episode VI Return of the Jedi   
2                                       NaN   

  Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.  \
0           Star Wars: Episode I  The Phantom Menace                                                                                              
1                                                  3                                                                                              
2                                                NaN                                                                                              

            ...                Unnamed: 28       Which character shot first?  \
0           ...                       Yoda                          Response   
1           ...             Very favorably  I don't understand this question   
2           ...                        NaN                               NaN   

  Are you familiar with the Expanded Universe?  \
0                                     Response   
1                                          Yes   
2                                          NaN   

  Do you consider yourself to be a fan of the Expanded Universe?ξ  \
0                                           Response                 
1                                                 No                 
2                                                NaN                 

  Do you consider yourself to be a fan of the Star Trek franchise?    Gender  \
0                                           Response                Response   
1                                                 No                    Male   
2                                                Yes                    Male   

        Age Household Income           Education Location (Census Region)  
0  Response         Response            Response                 Response  
1     18-29              NaN  High school degree           South Atlantic  
2     18-29     $0 - $24,999     Bachelor degree       West South Central  

[3 rows x 38 columns]
In [3]:
star_wars.columns
Out[3]:
Index(['RespondentID',
       'Have you seen any of the 6 films in the Star Wars franchise?',
       'Do you consider yourself to be a fan of the Star Wars film franchise?',
       'Which of the following Star Wars films have you seen? Please select all that apply.',
       'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14',
       'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
       'Unnamed: 28', 'Which character shot first?',
       'Are you familiar with the Expanded Universe?',
       'Do you consider yourself to be a fan of the Expanded Universe?ξ',
       'Do you consider yourself to be a fan of the Star Trek franchise?',
       'Gender', 'Age', 'Household Income', 'Education',
       'Location (Census Region)'],
      dtype='object')
In [7]:
star_wars = star_wars[pd.notnull(star_wars['RespondentID'])]
print(star_wars.head(3))
   RespondentID Have you seen any of the 6 films in the Star Wars franchise?  \
1  3.292880e+09                                                Yes             
2  3.292880e+09                                                 No             
3  3.292765e+09                                                Yes             

  Do you consider yourself to be a fan of the Star Wars film franchise?  \
1                                                Yes                      
2                                                NaN                      
3                                                 No                      

  Which of the following Star Wars films have you seen? Please select all that apply.  \
1           Star Wars: Episode I  The Phantom Menace                                    
2                                                NaN                                    
3           Star Wars: Episode I  The Phantom Menace                                    

                                    Unnamed: 4  \
1  Star Wars: Episode II  Attack of the Clones   
2                                          NaN   
3  Star Wars: Episode II  Attack of the Clones   

                                    Unnamed: 5  \
1  Star Wars: Episode III  Revenge of the Sith   
2                                          NaN   
3  Star Wars: Episode III  Revenge of the Sith   

                          Unnamed: 6  \
1  Star Wars: Episode IV  A New Hope   
2                                NaN   
3                                NaN   

                                     Unnamed: 7  \
1  Star Wars: Episode V The Empire Strikes Back   
2                                           NaN   
3                                           NaN   

                                 Unnamed: 8  \
1  Star Wars: Episode VI Return of the Jedi   
2                                       NaN   
3                                       NaN   

  Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.  \
1                                                  3                                                                                              
2                                                NaN                                                                                              
3                                                  1                                                                                              

            ...                  Unnamed: 28  \
1           ...               Very favorably   
2           ...                          NaN   
3           ...             Unfamiliar (N/A)   

        Which character shot first?  \
1  I don't understand this question   
2                               NaN   
3  I don't understand this question   

  Are you familiar with the Expanded Universe?  \
1                                          Yes   
2                                          NaN   
3                                           No   

  Do you consider yourself to be a fan of the Expanded Universe?ξ  \
1                                                 No                 
2                                                NaN                 
3                                                NaN                 

  Do you consider yourself to be a fan of the Star Trek franchise? Gender  \
1                                                 No                 Male   
2                                                Yes                 Male   
3                                                 No                 Male   

     Age Household Income           Education Location (Census Region)  
1  18-29              NaN  High school degree           South Atlantic  
2  18-29     $0 - $24,999     Bachelor degree       West South Central  
3  18-29     $0 - $24,999  High school degree       West North Central  

[3 rows x 38 columns]
  • The dataframe now contains only rows with a valid RespondentID.
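The row filter used above can be tried in isolation. This is a minimal sketch on a made-up frame (the IDs and the `toy` name are hypothetical, not survey data); it also shows `dropna(subset=...)`, which should express the same filter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the first row mimics the stray "Response" row
# that has no RespondentID.
toy = pd.DataFrame({
    "RespondentID": [np.nan, 1001.0, 1002.0],
    "Gender": ["Response", "Male", "Female"],
})

# Keep only the rows whose RespondentID is present.
valid = toy[toy["RespondentID"].notnull()]

# dropna restricted to one column expresses the same filter.
valid_alt = toy.dropna(subset=["RespondentID"])

print(valid)
```

Both versions keep the original index labels, which is why the filtered survey data starts at row 1 rather than 0.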

Cleaning and mapping a binary Yes/No column

In [5]:
yes_no = {'Yes': True, "No": False}

have_you_see = 'Have you seen any of the 6 films in the Star Wars franchise?'
do_you_consider = 'Do you consider yourself to be a fan of the Star Wars film franchise?'

star_wars[have_you_see] = star_wars[have_you_see].map(yes_no)
star_wars[do_you_consider] = star_wars[do_you_consider].map(yes_no)
In [8]:
print(star_wars.head(3))
   RespondentID Have you seen any of the 6 films in the Star Wars franchise?  \
1  3.292880e+09                                                Yes             
2  3.292880e+09                                                 No             
3  3.292765e+09                                                Yes             

  Do you consider yourself to be a fan of the Star Wars film franchise?  \
1                                                Yes                      
2                                                NaN                      
3                                                 No                      

  Which of the following Star Wars films have you seen? Please select all that apply.  \
1           Star Wars: Episode I  The Phantom Menace                                    
2                                                NaN                                    
3           Star Wars: Episode I  The Phantom Menace                                    

                                    Unnamed: 4  \
1  Star Wars: Episode II  Attack of the Clones   
2                                          NaN   
3  Star Wars: Episode II  Attack of the Clones   

                                    Unnamed: 5  \
1  Star Wars: Episode III  Revenge of the Sith   
2                                          NaN   
3  Star Wars: Episode III  Revenge of the Sith   

                          Unnamed: 6  \
1  Star Wars: Episode IV  A New Hope   
2                                NaN   
3                                NaN   

                                     Unnamed: 7  \
1  Star Wars: Episode V The Empire Strikes Back   
2                                           NaN   
3                                           NaN   

                                 Unnamed: 8  \
1  Star Wars: Episode VI Return of the Jedi   
2                                       NaN   
3                                       NaN   

  Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.  \
1                                                  3                                                                                              
2                                                NaN                                                                                              
3                                                  1                                                                                              

            ...                  Unnamed: 28  \
1           ...               Very favorably   
2           ...                          NaN   
3           ...             Unfamiliar (N/A)   

        Which character shot first?  \
1  I don't understand this question   
2                               NaN   
3  I don't understand this question   

  Are you familiar with the Expanded Universe?  \
1                                          Yes   
2                                          NaN   
3                                           No   

  Do you consider yourself to be a fan of the Expanded Universe?ξ  \
1                                                 No                 
2                                                NaN                 
3                                                NaN                 

  Do you consider yourself to be a fan of the Star Trek franchise? Gender  \
1                                                 No                 Male   
2                                                Yes                 Male   
3                                                 No                 Male   

     Age Household Income           Education Location (Census Region)  
1  18-29              NaN  High school degree           South Atlantic  
2  18-29     $0 - $24,999     Bachelor degree       West South Central  
3  18-29     $0 - $24,999  High school degree       West North Central  

[3 rows x 38 columns]
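To see what `Series.map` does with a dictionary, here is a minimal sketch on a made-up Series (the `answers` values are hypothetical). Values that are not keys of the dictionary, including missing answers, come out as NaN, so skipped questions stay distinguishable from a "No".

```python
import numpy as np
import pandas as pd

yes_no = {"Yes": True, "No": False}

# Hypothetical answers, including one skipped question (NaN).
answers = pd.Series(["Yes", "No", np.nan, "Yes"])

# Each value is looked up in the dictionary; NaN has no entry and stays NaN.
mapped = answers.map(yes_no)

# Count explicit "Yes" answers and skipped questions separately.
n_yes = int(mapped.eq(True).sum())
n_missing = int(mapped.isna().sum())
print(n_yes, n_missing)
```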

Cleaning and mapping a multiple-choice column

In [9]:
import numpy as np

yes_no_episodes = {
    "Star Wars: Episode I  The Phantom Menace": True,
    np.nan: False,
    "Star Wars: Episode II  Attack of the Clones": True,
    "Star Wars: Episode III  Revenge of the Sith": True,
    "Star Wars: Episode IV  A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True
}
star_wars = star_wars.rename(columns = {
    "Which of the following Star Wars films have you seen? Please select all that apply.": 'Seen 1',
    "Unnamed: 4": 'Seen 2',
    "Unnamed: 5": 'Seen 3',
    "Unnamed: 6": 'Seen 4',
    "Unnamed: 7": 'Seen 5',
    "Unnamed: 8": 'Seen 6'
})


for col in star_wars.columns[3:9]:
    star_wars[col] = star_wars[col].map(yes_no_episodes)

print(star_wars.head(3))
   RespondentID Have you seen any of the 6 films in the Star Wars franchise?  \
1  3.292880e+09                                                Yes             
2  3.292880e+09                                                 No             
3  3.292765e+09                                                Yes             

  Do you consider yourself to be a fan of the Star Wars film franchise?  \
1                                                Yes                      
2                                                NaN                      
3                                                 No                      

   Seen 1  Seen 2  Seen 3  Seen 4  Seen 5  Seen 6  \
1    True    True    True    True    True    True   
2   False   False   False   False   False   False   
3    True    True    True   False   False   False   

  Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.  \
1                                                  3                                                                                              
2                                                NaN                                                                                              
3                                                  1                                                                                              

            ...                  Unnamed: 28  \
1           ...               Very favorably   
2           ...                          NaN   
3           ...             Unfamiliar (N/A)   

        Which character shot first?  \
1  I don't understand this question   
2                               NaN   
3  I don't understand this question   

  Are you familiar with the Expanded Universe?  \
1                                          Yes   
2                                          NaN   
3                                           No   

  Do you consider yourself to be a fan of the Expanded Universe?ξ  \
1                                                 No                 
2                                                NaN                 
3                                                NaN                 

  Do you consider yourself to be a fan of the Star Trek franchise? Gender  \
1                                                 No                 Male   
2                                                Yes                 Male   
3                                                 No                 Male   

     Age Household Income           Education Location (Census Region)  
1  18-29              NaN  High school degree           South Atlantic  
2  18-29     $0 - $24,999     Bachelor degree       West South Central  
3  18-29     $0 - $24,999  High school degree       West North Central  

[3 rows x 38 columns]

As the output above shows, the film titles are replaced with True or False (a missing value becomes False), which lets us count how many respondents have seen each film.
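A possible shortcut, assuming each cell holds either a film title or NaN, is to test for non-missing values instead of listing every title in a dictionary. The sketch below uses a made-up two-column frame (`toy` is hypothetical); it is an alternative to the mapping above, not the method used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two of the renamed "Seen" columns.
toy = pd.DataFrame({
    "Seen 1": ["Star Wars: Episode I  The Phantom Menace", np.nan,
               "Star Wars: Episode I  The Phantom Menace"],
    "Seen 2": ["Star Wars: Episode II  Attack of the Clones", np.nan, np.nan],
})

# A cell holds a title when the film was ticked and NaN otherwise,
# so "is not null" is equivalent to "has seen the film".
seen = toy.notnull()

# True counts as 1, so summing each column counts the viewers.
viewers = seen.sum()
print(viewers)
```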

Cleaning the “Rankings” columns

In [10]:
star_wars = star_wars.rename(columns = {
    star_wars.columns[9]: "Ranking 1",
    star_wars.columns[10]: "Ranking 2",
    star_wars.columns[11]: "Ranking 3",
    star_wars.columns[12]: "Ranking 4",
    star_wars.columns[13]: "Ranking 5",
    star_wars.columns[14]: "Ranking 6"
})

print(star_wars.head(3))
   RespondentID Have you seen any of the 6 films in the Star Wars franchise?  \
1  3.292880e+09                                                Yes             
2  3.292880e+09                                                 No             
3  3.292765e+09                                                Yes             

  Do you consider yourself to be a fan of the Star Wars film franchise?  \
1                                                Yes                      
2                                                NaN                      
3                                                 No                      

   Seen 1  Seen 2  Seen 3  Seen 4  Seen 5  Seen 6 Ranking 1  \
1    True    True    True    True    True    True         3   
2   False   False   False   False   False   False       NaN   
3    True    True    True   False   False   False         1   

            ...                  Unnamed: 28  \
1           ...               Very favorably   
2           ...                          NaN   
3           ...             Unfamiliar (N/A)   

        Which character shot first?  \
1  I don't understand this question   
2                               NaN   
3  I don't understand this question   

  Are you familiar with the Expanded Universe?  \
1                                          Yes   
2                                          NaN   
3                                           No   

  Do you consider yourself to be a fan of the Expanded Universe?ξ  \
1                                                 No                 
2                                                NaN                 
3                                                NaN                 

  Do you consider yourself to be a fan of the Star Trek franchise? Gender  \
1                                                 No                 Male   
2                                                Yes                 Male   
3                                                 No                 Male   

     Age Household Income           Education Location (Census Region)  
1  18-29              NaN  High school degree           South Atlantic  
2  18-29     $0 - $24,999     Bachelor degree       West South Central  
3  18-29     $0 - $24,999  High school degree       West North Central  

[3 rows x 38 columns]
In [9]:
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)

Finding the highest-ranked film

In [10]:
star_wars[star_wars.columns[9:15]].mean()
Out[10]:
Ranking 1    3.732934
Ranking 2    4.087321
Ranking 3    4.341317
Ranking 4    3.272727
Ranking 5    2.513158
Ranking 6    3.047847
dtype: float64
In [11]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.bar(range(6), star_wars[star_wars.columns[9:15]].mean())
Out[11]:
  • At this point the data is clean and we have the mean ranking of each film, which lets us find the highest-ranked one. Because 1 means favorite, a lower mean is better: the older films (Episodes IV, V and VI) rank better than the prequels (Episodes I, II and III), and Episode V has the best mean ranking (2.51).
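The same reading can be done without a chart. The sketch below copies the mean rankings from `Out[10]` into a Series (the `mean_rank` name is only for illustration) and sorts them, keeping in mind that the lowest mean is the best ranking.

```python
import pandas as pd

# Mean rankings copied from Out[10] (1 = favorite, 6 = least favorite).
mean_rank = pd.Series({
    "Ranking 1": 3.732934,
    "Ranking 2": 4.087321,
    "Ranking 3": 4.341317,
    "Ranking 4": 3.272727,
    "Ranking 5": 2.513158,
    "Ranking 6": 3.047847,
})

# The best-ranked film is the one with the lowest mean.
best = mean_rank.idxmin()

# Sort ascending to list the films from best to worst ranked.
ordered = mean_rank.sort_values().index.tolist()

print(best)
print(ordered)
```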

Finding the most-viewed film

In [13]:
star_wars[star_wars.columns[3:9]].sum()
Out[13]:
Seen 1    673
Seen 2    571
Seen 3    550
Seen 4    607
Seen 5    758
Seen 6    738
dtype: int64
In [14]:
plt.bar(range(6), star_wars[star_wars.columns[3:9]].sum())
Out[14]:
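  • The most-viewed films are Episodes V and VI from the original trilogy (758 and 738 respondents), although Episode I (673) was seen by more respondents than Episode IV (607). Recall that the original trilogy, the first films to be released, also received the best rankings, as we saw earlier.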
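As a cross-check of the bar chart, the counts from `Out[13]` can be sorted directly. This is a small sketch; the `seen_counts` name is introduced here only for illustration.

```python
import pandas as pd

# Viewer counts copied from Out[13].
seen_counts = pd.Series({
    "Seen 1": 673,
    "Seen 2": 571,
    "Seen 3": 550,
    "Seen 4": 607,
    "Seen 5": 758,
    "Seen 6": 738,
})

# The most-viewed film has the highest count.
most_seen = seen_counts.idxmax()

# Sort descending to list the films from most to least viewed.
by_views = seen_counts.sort_values(ascending=False).index.tolist()

print(most_seen)
print(by_views)
```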

Exploring the data by binary segments

In [17]:
male = star_wars[star_wars['Gender'] == 'Male']
female = star_wars[star_wars['Gender'] == 'Female']

plt.bar(range(6), male[male.columns[9:15]].mean())
plt.show()

plt.bar(range(6), female[female.columns[9:15]].mean())
plt.show()
In [19]:
plt.bar(range(6), male[male.columns[3:9]].sum())
plt.show()

plt.bar(range(6), female[female.columns[3:9]].sum())
plt.show()

Conclusion

We can say that women and men have seen the films in similar proportions; that is, neither gender favors one film over another.
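One way to make this comparison more direct, sketched here on made-up data rather than the survey responses, is to group by `Gender` and take the mean of the boolean "Seen" columns. The mean of a boolean column is the share of respondents who saw the film, which stays comparable even when the two groups differ in size. The `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not the survey responses.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Seen 1": [True, False, True, True, False],
    "Seen 5": [True, True, True, False, True],
})

# Mean of a boolean column = share of respondents who saw that film.
share_by_gender = toy.groupby("Gender")[["Seen 1", "Seen 5"]].mean()

print(share_by_gender)
```

On the real data, the same call with the six "Seen" columns would give one row per gender that can be compared side by side.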


Published

Jun 2, 2017

Category

Data cleaning

Tags

  • data visualization 5
  • matplotlib 5
  • python 10
